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Qh. We propose an efficient numerical algorithm for solving integral equations of the theory of liq- 

■ uids in the Reference Interaction Site Model (RISM) approximation for infinitely dilute solution of 

macromolecules with a large number of atoms. The algorithm is based on applying the nonstationary 
iterative methods for solving systems of linear algebraic equations. We calculate the solvent-solute 
atom-atom correlation functions for a fragment of the B-DNA duplex d(GGGGG)-d(CCCCC) in 
infinitely dilute aqueous solution. The obtained results are compared with available experimental 
data and results from computer simulations. 

o 

CO 



> 



- 1—1 

X 



I. INTRODUCTION 



Interactions of nucleic acids with proteins, lipids and other ligands in many respects are determined by the solvation 
properties of these biomolecules. The structure of the DNA molecule itself in the cytozolic environment of a cell is 
strongly dependent on its hydration. Information about the structure of water shell around many biomolecules is 
available primarily from X-ray and neutron scattering experimental on their crystalline forms, as well as from NMR 
^ i ■ studieso of such molecules in solution. n 

There is a large number of reviews devoted to the problem of biopolymer hydrationta. One of common approaches to 
these difficult problems is based on computer simulations using either Molecular Dynamics or Monte Carlo methodsaQ. 
The main limitation of these direct simulation techniques is in the huge computational expense for reaching the equi- 
librium state even for comparatively small molecular weights. A promising alternative for addressing the equilibrium 
issues is based on the methods of statisticaL-naechanics and, particularly, on the method of integral equations of the 
theory of liquids commonly known as RISMErQ. The main advantage of the method based on the Reference Interac- 
f*^S 1 tion Site Model (RISM) equations is that here one explicitly deals with all quantities averaged over the equilibrium 
Gibbs distribution. This results in a considerable reduction of computational expenses for many problems compared 
■ to those of direct simulation methods. An example of application of the RISM technique to hydration of biomolecules 
is the calculation of the alanine dipeptide carried out in Ref. |l^. 

Practically, in the limit of infinite dilution the RISM method requires numerical solution of a system of inte- 
gral equations for the solute-solvent pair correlation functions of order proportional to the number of atoms in a 
^ , macromolecule. Here the solvent-solvent correlation functions can be found beforehand and the conformation of a 
macromolecule is assumed fixed. We note that memory requirements for keeping the matrix of unknown variables 
could be quite considerable — a few tepabytes of RAM or more even for a DNA duplex of only 5 bases. 

A number of coarse-grained modela!3[L3 for complicated macromolecules have been suggested to eliminate this 
difficulty in the RISM method. For instance, in Refs. [ll] the solvation of model DNA structures has been investigated 
using such coarse-graining with each nucleotide pair being approximated by a potential centre and the solvent by 
a binary mixture of charged particles. Similar coarse-grained models may be of interest for addressing non-specific 
hydration effects, which are fairly independent of the detailed molecular structure and thus universal for a given 
class of biomolecules. In Ref. [l3] the hydration of a linear nonpolar chain in aqueous solution has been considered 
to elucidate the general behaviour of hydration of long macromolecular chains. Unfortunately, errors resulting from 
a coarse-graining procedure are practically impossible to estimate and therefore such models seem to be too remote 
from the complexities of the real DNA molecule. 

Here we suggest an efficient numerical algorithm for solving the RISM equations that makes it feasible to study 
a detailed model for short dupleces of DNA in water in a quite moderate computational time, even on a PC and, 
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most importantly, avoiding the necessity to deal explicitly with huge matrices. The main idea of this algorithm is 
to use the Newton-Raphson scheme with a specially appromixated Jacobi matrix with consequent application of the 
nonstationary iterative methods for solving large systems of linear equations. A review of modern developments in 
the latter field, on which our approach heavily relies, may be found in Ref. [h| 

In this work we describe the new algorithm for calculation of the solvent-solute correlation functions of a macro- 
molecule in aqueous solution. This is then applied to study of the hydration of B-DNA dupleces. 



II. BASIC EQUATIONS OF THE RISM THEORY 

Various integral equation techniques have proved to be successful for a wide range of applications from simple 
monoatomic liquids to complex multicomponent mixtures of molecular liquids. Generally, the correlation functions 
satisfy the infinite chain of Bogoliubov-Born-Green-Kirkwood-Yvon (BBGKY) integro-differential equations. To 
achieve a practical progress one possibility is to apply a closure relation expressing high-order correlation functions 
in terms of a few low-order ones. Many different closure relations have been suggested, each having its own region 
of validity, which can be established by comparing the results of calculations with available data from experiment or 
computer simulations. 

Another particularly fruitful strategy is based on the Ornstein-Zernike (OZ) equation, which relates the direct, 
C(fc), and the total, h(k), correlation functions in the Fourier space, 

(1 - pC(k))(l + p~h(k)) = 1, (1) 

where p is the density of the liquid. Here the original and the Fourier conjugated functions, the latter being designated 
by the tilde, are related as follows, 

47r f°° f°° sm(kr) 
x(k) = — / rdr sm(kr) x(r), x(r) = / kdk ^ — i(k). (2) 

The total correlation function is defined in such a way that h(r) + 1 is the probability density to find an atom at the 
distance r from another one. Eq. (|l|) is an exact integral equation between h(r) and the direct correlation function, 
C(r), which can be established based on the diagrammatic cluster expansion. This has a simple physical meaning. 
C(R) is said to be caused by the two-body interaction potential between a pair of atoms, it(r), whereas the indirect 
correlation function, j(r) — h(r) — C(r), contains contributions of all other atoms. 

The direct correlation function also has a diagrammatic representation, which can be viewed as the exact closure 
relation between the two unknown functions. However, one would like to express such a relation in an explicit form 
which is popularly written as, 

C(r) = exp(-u(r)/fc B T + 7 (r) + B[ 7 (r)]) - 7 (r) - 1, (3) 

where fee is the Boltzmann constant, T is the absolute temperature and £>[7(r)] is called the bridge functional. 
Unfortunately, no strict analytical expression can be found for the latter. 

Therefore, in practice, one has to revert Xo some kind of an approximate closure relation inspired by partial 
resummations of the diagrammatic expansions. Let us mention here two of the most widely used relations, 

B[ 7 (r)} = 0, (4) 
£[7(r)]=m(l+7(r))- 7 (r), (5) 

which are called the hyper-netted chain (HNC) and the Perkus-Yevick (PY) closures respectively. It is well established 
that, generally, the approximation (^) is more accurate for long-range potentials, whilst approximation (^) works 
better for short-range potentials. 

Generalisation of the OZ equation to multicomponent mixtures of molecular liquids seems to be relatively straight- 
forward. The conformation of a molecule is assumed to be fixed, so that the correlation function between the i-th and 
j-th sites (atoms) inside the same molecule are given by Vij(r) — 5(r — rij)/4irr 2 , where r*y- are the known interatomic 
distances. In the Fourier space the analogue of Eq. (Q) can be written in the matrix form, 

(v(k)+ph(k)) (v(k)- 1 - pc(k)) = I (6) 

where v is a block diagonal matrix with the number of blocks equal to the number of distinct molecules and the 
elements of each block are equal to Wy(fc) = a\n{krij) /krij . The matrix p contains the partial atomic densities of the 
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mixture and possesses the same block structure as the matrix v. The form (||) is often called the Site-Site Ornstein- 
Zernike equation (SSOZ), and jointly with a particular closure relation (0) such as e.g. HNC or PY is loosely referred 
to as the RISM theory. We shall favour this terminology for historical reasons despite it may not be the most logical 
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The important point is, however, that the relation of these equations to the diagrammatic expansion is no longer 
preserved in the site-site version of the formalism. So. in a sense, the SSOZ equation may be simply viewed as a formal 
definition of the site-site direct correlation functional, with no particular deep meaning attached to the analogues of 
HNC or PY closures. The shortcomings of the RISM technique, that are possibly related to this circumstance, are 
well known and include the problems with the long-ranged orientational correlations and the equation of state (to 
discussion of these problems and attempts to improve the RISM theory we refer the reader to standard reviews such 
as e.g. ^). Despite these deficiencies, we believe that the RISM remains a very useful tool for studying hydration of 
complex biomoleculest-51, where .tup other method can really compare to it in computational power. However, based on 
the standard techniques in Ref.E^I it was possible to consider the system of 413 atoms only thanks to using a powerful 
supercomputer. 

For a two-component molecular solution let us denote by p the density of the solvent and by p u the density of the 
solute. In the limit of infinite dilution, p u /p — * 0, Eq. is decoupled, 

f = v v c v v v (1 - pc^V'y 1 ~ c v (k), (7) 
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(v v +ph v ^j - c uv , (8) 



where the superscripts v and u refer to the solvent and the solute respectively. 

If we are interested in hydration of a Y-atomic molecule, Eq. (||) for the solute-solvent correlation functions of all 
atoms in the molecule with the O and H atoms of water may be rewritten in the Fourier space as follows, 

G = (V ® W-l 2N )C, (9) 

where G and C are vectors of size 2Y containing the correlation functions, 

G T = [jm(k), . . . , TjvhW, 7io0), ■ ■ ■ , lNo(k)] , (10) 
C T = [cm(fc), . . . , CATH(fc), cio(fc), ■ ■ ■ , c NO (k)} , (11) 

where the superscript T stands for the transposition. Here the first subscript of a correlation function is the atom 
number in the solute molecule and the second superscript designates the atom type in the water molecule. The 
structure matrix, W, of size Y is expressed via the interatomic distances, , 

w l3 = s ^pil. (12) 

The matrix V expresses the geometrical and correlational structure of the water, 



V 



2v 3 v A 



(13) 



where its components are, 

vi = l + phmi(k), v 2 = sin(fcr H H)/fc^HH + phna(k), ^ 

v 3 = sin(fcr H)/fc?'OH + ph n{k), v 4 = 1 + ph o(k)- 

Here Thh and toh are the interatomic distances in the water molecule, p is the density of water and /ihh, 'ioh and 
hoo are the total solvent -solvent correlation functions which may be found beforehand. 

To avoid divergences we-.apply the HNC closure relation (Q) using the renormalization procedure of the Coulomb 
potential proposed by Ngt3El, 

c&(r) = exp (-U tX (r)/k B T + f iX (r) + 7 &(r)) - 7 &(r) - 1, (15) 

where Uix{r) is the non-Coulombic part of the interatomic potential, the subscript i is the atom number in the 
solute molecule and index X refers to atoms O or H in the water molecule. Here the function fix is the renormalized 
Coulomb potential expressed through the charges and qx- 
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/*(r) = , er !! r) - (16) 

r/c B J 

The short-range parts of the correlation functions, which are designated by the superscript s above, are related to 
the complete correlation functions as follows, 

7«(*)=7«(*) "/«(*). c lX (k)=£tx(k) + f i x(k). (17) 

We also note that the introduction of the short-range correlation functions allows one to perform the Fourier trans- 
formation using the fast Fourier transform technique. 

III. THE NUMERICAL ALGORITHM 

In this section we propose an algorithm for numerical solution of the system of 2N integral equation (|^, |l5| ) for large 
values of N. We assume that the solvent density, p, the temperature /cbT, the matrices W and V and the interaction 
potentials appearing in the closure relation ( p"5| ) are given. 

First of all, let us introduce discretisation by choosing L grid points uniformly distributed in the r- and fc-spaces, 

ArAfc = p ri = iAr, kj = jAk, (i, j = 1 . . . L). (18) 

We denote by X the 2./VL-component vector, which approximates the set of functions, jfx(h) m jAk grid points, 

X T = [r m (Ak), . . . , Ym(LAk), . . . , % Q (LAk)] . (19) 
Then we introduce the discrete Fourier transformation operators T' and T h , 

F f = ^£fs, F> = ^£s, S^iMjUl (i,j = l,..;L). (20) 
Ak 2-k z Ar i L 

Eq. @ becomes a system of 2NL nonlinear algebraic equations, 

X - Z[X] = X - A {I 2N (g) Ff)C[(l 2N (» F b )X] - Q = 0. (21) 

Here A is a matrix of size 2iVL consisting of diagonal blocks of size L, which is the discrete analogue of the matrix 
V <g> W — i 2 N, and Q = (V ® W / )[/iH(fc), ■ ■ ■ , //vo(^)] T - F° r discretisation of the closure relation jl5| ) let us also 
introduce the function C[F], 

C l = C l [Y l ]=M l e X p{Y l )-Yi-l, (I = 1, . . . , 2NL), (22) 
where Mi are elements of a 27VL-dimensional vector M, 

M T = [exp(/m(r) - ^m(r)), . . . , exp(/,vo(r) - C^ro(r))] . (23) 

The simplest algorithm for solving the system of nonlinear algebraic equations (|2l]) would be the Picard method 
of direct iterations. Thus, given an initial approximation, X^°\ the improved solution is found from the recurrent 
relation, 

X {n+1) = Z[XW], (71 = 0,1,...). (24) 

The iterations should stop when the norm of X( n+1 "> — X( n *) becomes smaller than a predefined small value. Some 
modifications of the method of direct iterations are often utilized, which involve several previous iterations for providing 
a better convergence. A good example of such modifications, based on a vector extrapolation, was used in Ref. 
The direct iteration method for solving of the RISM equations was used in Ref. [l^, where the principle of "monotonic 
discrepancy" was applied to improve convergence. Unfortunately, in practice the convergence of the direct iteration 
method seems to be an exception rather than the rule. 

The Newton-Raphson algorithm appears to be much more efficient for solving equations (pl|). Here iterations are 
performed by the following scheme, 
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jl(n+l) = x(n) + A X(") 5 ( 25 ) 

AX^ = J- 1 [Z[X n ] - X n ), (26) 



where J is the Jacobi matrix obtained by differentiating Eq. (21) 



dZ 2 

J = l 2 NL zr = 12NL ~ T A (1 2 N ®S)D (1 2 N <8> S). (27) 

dX L 

The eiements of the diagonal matrix D are given by, 

Ai=Afiexp(r,)-l. (l = l,...,2NL). (28) 

The disadvantage of the Newton-Raphson method is that at each iteration step it is necessary to solve a system of 
linear equations (^6|) of size 2NL. It is a rather large system, since, even in the case of comparatively small N, the 
number of grid points L should be of order of thousands to achieve a good discretization. Despite that the convergence 
of the Newton Raphson iterations is rather fast, in practice the expenses for solving the system of linear equations 
with thepLacobi matrix of size 2NL are extremely high. 

Gillar£2l has suggested a more efficient algorithm which is essentially a hybrid of the Newton-Raphson and the 
Picard schemes for the 'coarse' and 'fine' parts of the solution respectively, the former being obtained as an expansion 
in the basis of so-called roof functions. In Ref. |2Tj it was noted that the expansion of the 'coarse' part as a truncated 
Fourier series yields even a better convergence. Thus, the main feature of both the Gillan and the LMV methods 
is that the solution is sought in a cycle of the Newton-Raphson steps for a small-size system of equations for the 
expansion coefficients of theJcoarse' part, and a consequent Picard refinement for the solution. We note, however, 
that this attractive schemetil is difficult to apply in practice for large molecules. Indeed, if the size of the system of 
equations for the 'coarse' part is taken relatively small, the consequent direct Picard iteration would not converge. If, 
on the contrary, this size is taken sufficiently large, the corresponding Jacobi matrix would not fit into the computer 
memory. 

Here propose a somewhat different strategy which could be viewed as the Newton-Raphson scheme with an ap- 
promixated Jacobi matrix. Namely, let us choose a subset of the grid points in the coordinate and k- spaces, 

r'^imAr, k = jAk, = 1 . . . -), (29) 

where m is an integer number. The Jacobi matrix calculated on this subset is denoted as J m and its size is 2NL/m. If 
we denote as k the complementary subset of components, we suggest to perform iterations according to the following 
formula, 

£(n) = x(n) _ z\X [n \ (30) 

X {n+1 \k) = X^{k) - J- 1 R^ n \k'), (31) 

X* n+1 >(fc") = A»(fc") - R^(k"). (32) 

Thus, the approximated inverse Jacobi matrix has the elements of the exact inverse to the reduced Jacobi matrix for 
the k' subset indices, and the unity matrix for the k" subset indices. Again, iterations are carried out until the norm 
of the vector Ry 1 ) becomes smaller than a predefined small value. 

The main feature of this particular scheme, which is an alternative to the Gillan and LMW methods, is that, 
in a sense, it combines the Newton-Rapson and Picard iterative schemes performed simultaneously. The reduction 
parameter to, which could be chosen as an integer power of 2, can be decreased during iterations if the convergence is 
slowing down, or can be increased if the convergence is improving. Practically, to achieve a good initial convergence, 
the size of the reduced Jacobi matrix 2 NL jm is still going to be rather large for the molecule we are going to study 
here. Fortunately, now this difficulty can be eliminated. 

For calculation of the inverse matrix in Eq. (^), i.e. solving a system of linear equations, we propose to apply 
the nonstationary iterative methods. Basically, these methods are versions of the conjugated gradients method. The 
main advantage of these methods is a better convergence for a wider class of matrices. Also, compared to the Gauss 
method, they do not require to keep the matrix of the system in the computer memory. Instead, it would suffice here 
to provide a procedure of matrix-vector multiplication, y — Jx, for arbitrary vector x. If the number of grid points L 
is equal to an integer power of 2, then one can use the fast Fourier transform technique for calculation of vector-matrix 
multiplications. It is straightforward to see that this procedure requires LN 2 floating point operations for large TV. 
The additional calculational efforts per. iteration in the nonstationary iteration methods are proportional to NL. For 
example, the Bi — CGSTAB methoct-3 requires two calculations of vector-matrix products, and an additional 48NL 
floating point operations. Overall, the memory requirements here are restricted to 20NL floating point units. 
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IV. NUMERICAL RESULTS 



In this section we study fragments of the double helix DNA formed by nucleotide pairs G:C. We have calculated 
fragments of different lengths (from 1 to 5 pairs), so that the number of atoms N varied from 63 to 315. 

We assumed the geometry of these fragments to be standardEa and the interaction potentials of atoms in different 
pairs in a fragment with water equal. According to Ref. ^3] the non-Coulombic part of the interaction potentials of 
DNA atoms with water atoms can be taken in the form, 

U ix (r) = A lX /r 12 ~ B tX /r 6 . (33) 

If an atom forms a hydrogen bond we use another expression for the potential, 

U iX (r) = A' lX /r^ - B' iX /r 10 . (34) 

The latter form is used in the AMBER force field potential parametrisation0. It is added to merely fine-tune the 
distances between atoms forming hydrogen bonds, which, in principle, are well enough represented by the Coulombic 
attraction alone. The potential parameters were taken from Ref. |2^ and the atomic charges were taken from Ref. |2^. 
As solvent we use water at the normal density 1 g/sm 3 and the temperature 25°C. Correlation functions of pure 
water were calculated using Eq. (^) in the HNC closure for the potential model TIPS with the parameters taken from 
Ref. |2^. The charge of the oxygen atom in the water molecule we consider to be equal to go = —0.86 and that of 
the hydrogen atom — to gn = — 9o/2- The interatomic distances in the DNA fragment were approximated up to the 
precision of 0.01 A to reduce the number of distinct elements in the matrix W. This allowed us to strongly reduce the 
requirements for RAM. Nevertheless, our calculations show that the correlation functions change rather insignificantly 
due to this round-off. The number of grid points and step size were taken equal to L = 1024 and Ar = 0.025 A 
The reduction parameter was set to m = 8, so that the size of the linear system is equal to 80640. The TFQMR 
nonstationary methocO was applied for solving the reduced system of linear equations. The precision parameters for 
solution of the nonlinear equations, e, and of the linear ones, S, were equal to 10 -5 and 10 -6 respectively. Calculations 
were carried out as follows. First, we calculate the correlation functions of one nucleotide pair. The output of this 
calculation is then used as input for calculation of two nucleotide pairs, and so on. 

In Tab. Q we present the computational expense of the problem. One can see that the computational time does not 
grow catastrophically with the increasing system size. The number of iterations, ]\f ITER j remains sufficiently low for 
fragments lengths considered here. This is because the calculational time is determined mainly by the time required 
for matrix-vector multiplications and it grows approximately as the second power of the reduced linear system size, 
2NL/m. 

In Fig. 1 we exhibit the calculated correlation functions, h(r) + 1, for some of the DNA duplex atoms with atoms 
of water. Heights of the first maximum of the correlation functions vary from 4 to 0.4. High peaks of correlation 
functions are observed for heteroatoms in bases and for the oxygen atoms in the sugar-phosphate backbone. Heights 
of the first peak, which are less than unity, correspond to those atoms which are buried in the DNA molecule and 
thus screened by other atoms from a direct contact with water molecules. 

Now, let us consider separately the hydration of bases, the sugar-phosphate backbone and also the characteristic 
features of hydration of the major and minor grooves of DNA. 



A. Hydration of bases 

The highest correlation peaks correspond to the nitrogen atoms of guanine N2 (see Fig. la) and N7 (Fig. lb). 
The position of the peak in the correlation function between the hydrogen of water molecules and N7 in guanine is 
shifted to the left compared to the peak of the correlation between the oxygen of water and N7. The positions of the 
peaks in correlations atom N2 — oxygen in water and atom N2 — hydrogen in water are nearly identical. Also, the 
height of the first peak in correlations of atom N2 — hydrogen is much smaller than that for the atom N7. According 
to the results from Monte Carlo simulationso this difference in behavior and degree of hydration of N7 and N2 is a 
consequence of that N7 atom is a proton acceptor, while N2 is a proton donor in the hydrogen bond. Obviously this 
results in different water molecule orientations around these atoms, and this is manifested in the considered correlation 
functions. The characteristic difference in the hydration of these atoms along the duplex chain is that the hydration 
of N7 does not change along the chain, whilst the hydration of N2 increases from ends to the centre approximately in 
1.5 times. 

The oxygen 02 in cytosine (see Fig. lc) participates in a hydrogen bond with an amino group of guanine through 
the atom 1H2. The oxygen 06 in guanine (Fig. Id) forms a hydrogen bond with an amino group of cytosine through 
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atom 2H4. The correlation functions of these two atoms are similar to each other. The characteristic feature of 
these functions is increasing of the first peak heights from ends to the centre analogously to N2 atom in guanine. 
Similarly, the oxygen atoms participate in forming hydrogen bonds with water as can be seen from shifted peaks in 
their correlation functions with the hydrogen of water molecules. 

The hydrogen atoms in amino groups of guanine, 2H2 (Fig. le) and 1H2 (Fig. If), and 1H4 (Fig. lg) and 2H4 
(Fig. lh) of cytosine differ significantly in their hydration behaviour. Protons participating in hydrogen bonds with 
oxygens 02 and 06 produce approximately twice lower height of the first correlation peak compared to that of the 
correlation functions for 2H2 of guanine and 1H4 of cytosine. This is not surprising since hydrogens 2H2 and 1H4 are 
open for water molecules, while 2H4 and 1H2 atoms hydrate weakly. 

Hydration mechanisms of Nl atom in guanine (see Fig. li) and N3 (Fig. lj) atom in cytosine, which both participate 
in hydrogen bonds between guanine and cytosine, are very different from each other. The distinction can be observed 
in the height of the first correlation peak with the hydrogen atom. N3 atom in cytosine possesses a large partial charge 
(approximately —0.7), resulting in a shift of the correlation peak with the hydrogen and in its comparatively large 
height. At the same time, the first peak in the correlation function with the oxygen is somewhat lower than unity for 
inner nucleotides of the DNA duplex. This means that, as in the case of the Nl atom, water weakly penetrates into 
the inter-planar space of neighbouring nucleotide pairs. 

N3 atom of guanine (Fig. Ik) is partially screened from water molecules by amino group atoms. Thus, it hydrates 
in a much smaller degree than N7 atom (Fig. lb), though atom N3 possesses a larger partial charge (—0.63) compared 
to the N7 atom (—0.54). N9 nitrogen atom in guanine (Fig. 11) and Nl atom in cytosine (Fig. lm) do not hydrate at 
all since they form a glycosidic bond. 

B. Hydration of the sugar— phosphate backbone 

Correlation functions of OIP atom of C residue on the 5' end are smaller than that in the other four nucleotide 
pairs (Fig. In). On the contrary, the 02P atom of C-nucleotide on 5' end possesses higher correlation peak compared 
to other pairs (Fig. lo). We can observe the same picture for oxygen atoms in the phosphate group of G-nucleotide 
residue (Fig. lp and Fig. lq). 

The phosphorus atoms in C and G residues hydrate nearly equally, except for the 5' end atoms, where the first 
correlation peak becomes broader (Fig. lr and Fig. Is). As for the hydration of deoxyribose, we should note that the 
ring atom 04* hydrates most strongly, the correlation peak on the 5' end being smaller for the G residue (Fig. It 
and Fig. lu). 03* oxygens in G and C residues do not hydrate except for atoms on 3' ends, which can be seen from 
Figs, lv, lw. Hydration of G 05* (Fig. lx) and C 05* (Fig. ly) atoms does not differ from each other. Unlike the 
03* atom (Fig. lv, lw), the hydration of end- and center- 05* atoms is the same. The rest of carbon and hydrogen 
atoms in the sugar-phosphate backbone hydrate rather weakly. 

C. Hydration of the major and the minor grooves 

Strong hydration observed for N7 (Fig. lb) and 06 (Fig. Id) atoms of guanine and N4 atom of cytosine (Fig. lz) 
qualitatively corresponds to the hydration regions Wl and W2 presented in Ref. ^?], while hydration of 02 atom in 
cytosine (Fig. lc) and N3 (Fig. Ik), N2 (Fig. la), 2H2 (Fig. le) atoms in guanine correspond to the region SI. 

On the other hand, as we have already mentioned, the first peaks in the correlation functions grow from ends of 
the duplex to its center for heteroatoms N2 (Fig. la), 06 (Fig. Id), N4 (Fig. lz) and 02 (Fig. lc) participating in 
hydrogen bonds between guanine and cytosine. On the other hand, it is known from the calculations of hydration for 
monoatomic nonpolar chainsliij and can be explained from the steric considerations, that the first peak of correlation 
function is much higher for end-atoms than for center-atoms. Such distinction in hydration behavior may be explained 
by two factors: first, by the presence of large partial charges on heteroatoms, and also by the geometrical structure 
of a Watson-Crick pair G:C. Presumably, this effect has a cooperative origin and is connected to the orientational 
ordering of water molecules in the grooves. 

The values of correlation functions calculated by us are in good agreement with the results of Ref. where the 
microscopic densities of oxygen and hydrogen atoms of water were calculated in the region contiguous to the major 
and the minor grooves of DNA using the mean-force potential formalism. 
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V. CONCLUSION AND DISCUSSIONS 



In the current work we have developed an efficient algorithm for solving the integral equations of the theory 
of liquids in the RISM approximation for studying hydration of macromolecules. The current treatment allows 
one to study hydration without applying any additional simplifying assumptions about the molecular structure of 
a macromolecule. The algorithm can be used to consider comparatively long (several dozens of nucleotide pairs) 
biologically functional DNA fragments using small computers. It can also be used for studying hydration of promoters 
and genome terminators. 

In particular, here we have studied the hydration of B-DNA fragments consisting of identical nucleotide pairs. Our 
results lead us to the conclusion that the hydration of nucleotide atoms in the DNA chain depends mainly only on the 
structure of its neighbouring environment, i.e. it is local by its nature. Our conclusions may as well be extended to 
an arbitrary sequence of G:C pairs. One expects that the hydration of the sugar-phosphate backbone depends very 
weakly on the particular sequence of base pairs. However, study of the dependence of the hydration of nucleotide 
pairs in the chain on the type of neighbouring pairs is beyond the scope of current paper. 

In our calculations we assumed a rigid geometrical structure of DNA duplex. There are reasons to believe that the 
hydration of a flexible or a bent double helix differs from that of a rod-like structure. We believe that the flexibility of 
five nucleotide pairs is small enough for these effects to appear. In principle, the RISM method allows one to calculate 
the mean-force potentials as function of a particular molecular geometry, which would finally permit study of the 
problem of the optimal DNA geometry in water by Monte Carlo methodlia. 

The reason we chose the length of the DNA duplex equal to five nucleotide pairs is that, as our calculations show, 
for this length the influence of the ends on the hydration of the central pair is very small. This can be clearly seen from 
the data, since in most cases the correlation functions of the penultimate and the central pairs are nearly identical. 
Thus, one can assume that the hydration of the central nucleotide pair in a short chain differs insignificantly from 
that of a nucleotide pair inside an infinitely long chain. We note that m—this respect the solvation of monoatomic 
nonpolar chains in the aqueous environment is analogous to that of DNAMI. 

We realize that the current molecular model is limited in many respects since the DNA molecule was electro-neutral 
here due to a parametric account of the phosphate screening. Nevertheless, the existing experience of calculations 
of DNA hydration by various methods indicates that this approximation appears to be quite adequate in the region 
restricted by the first coordination sphere&uO. The method used by us can be extended to include free mobile ions 
in the model and thus to study macromolecules in ionic solutions. 
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TABLE I. Computational expense for calculation of the DNA hydration. Here N G:C is the number of nucleotide pairs, N 
is the number of atoms in the fragment, N w is the number of distinct elements in the structure matrix W, pj TF Q MR [ s the 
number of iterations required for solving the linear system, N ITER \ s the total number of iterations for the given DNA fragment 
and Tpp ro is the computational time in minutes for a Pentium PRO 200 MHz and 64 Mb RAM computer. 
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FIG. 1. Plots of the correlation functions, h(r) + 1, of some atoms in the DNA duplex with atoms of water versus the radius, 
r, measured in Angstoms. In each figure the solid line corresponds to the correlation function of the atom with the hydrogen 
atom of water molecules and the dashed line with points corresponds to the correlation function of the atom with the oxygen 
atom of water molecules. In each row we present from left to right the correlation functions corresponding to the given atom 
in the first, second, third and fifth nucleotide pairs of the B-DNA duplex d(GGGGG)-d(CCCCC) starting from the 5' atom in 
guanine. The data for the forth pair is suppressed. 
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